Biconvex Biclustering
Rosen, Sam, Chi, Eric C., Xu, Jason
This article proposes a biconvex modification to convex biclustering in order to improve its performance in high-dimensional settings. In contrast to heuristics that discard a subset of noisy features a priori, our method jointly learns and weights informative features while discovering biclusters. Moreover, the method is adaptive to the data and is accompanied by an efficient algorithm based on proximal alternating minimization, complete with detailed guidance on hyperparameter tuning and efficient solutions to the optimization subproblems. These contributions are theoretically grounded: we establish finite-sample bounds on the objective function under sub-Gaussian errors and generalize these guarantees to cases where the input affinities need not be uniform. Extensive simulation results show that our method consistently recovers the underlying biclusters while weighting and selecting features appropriately, outperforming peer methods. An application to a gene microarray dataset of lymphoma samples recovers biclusters matching an underlying classification, while the column groupings and fitted weights give additional interpretation to the mRNA samples.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Minnesota (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.66)
- Information Technology > Biomedical Informatics > Translational Bioinformatics (0.86)
- Information Technology > Data Science (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
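To make the alternating scheme described in the abstract above concrete, here is a minimal, illustrative Python sketch of a biconvex procedure that alternates a U-step (fitting a smoothed mean matrix under fixed feature weights) with a closed-form w-step. The shrinkage stand-in for the fusion penalty's proximal operator, the entropy-style weight update, and all parameter names are assumptions for illustration, not the paper's actual algorithm or tuning.

```python
import numpy as np

def biconvex_biclustering(X, lam=1.0, gamma=0.1, step=0.25, n_iter=100):
    """Toy alternating minimization for a weighted loss
    sum_j w_j * ||X[:, j] - U[:, j]||^2 plus a fusion-style penalty on U.
    The true fusion prox is replaced by shrinkage toward the column
    means; this is a sketch, not the paper's solver."""
    n, p = X.shape
    U = X.astype(float).copy()
    w = np.full(p, 1.0 / p)          # start from uniform weights on the simplex
    for _ in range(n_iter):
        # U-step (w fixed): gradient step on the weighted squared loss,
        grad = 2.0 * w * (U - X)     # w broadcasts across columns
        U = U - step * grad
        # then a crude stand-in for the fusion penalty's prox operator:
        U = (U + gamma * U.mean(axis=0)) / (1.0 + gamma)
        # w-step (U fixed): entropy-regularized closed form on the simplex,
        # w_j ∝ exp(-r_j / lam), down-weighting poorly fit (noisy) features.
        r = ((X - U) ** 2).sum(axis=0)
        w = np.exp(-(r - r.min()) / lam)
        w /= w.sum()
    return U, w
```

The point of the sketch is the biconvex structure itself: each subproblem is convex in its own block, and the w-step admits a closed form once the per-feature residuals are fixed.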
Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms
Tabular foundation models (TFMs) such as TabPFN (Tabular Prior-Data Fitted Network) are designed to generalize across heterogeneous tabular datasets through in-context learning (ICL). They perform prediction in a single forward pass conditioned on labeled examples without dataset-specific parameter updates. This paradigm is particularly attractive in industrial domains (e.g., finance and healthcare) where tabular prediction is pervasive. Retraining a bespoke model for each new table can be costly or infeasible in these settings, while data quality issues such as irrelevant predictors, correlated feature groups, and label noise are common. In this paper, we provide strong empirical evidence that TabPFN is highly robust under these sub-optimal conditions. We study TabPFN and its attention mechanisms for binary classification problems with controlled synthetic perturbations that vary: (i) dataset width by injecting random uncorrelated features and by introducing nonlinearly correlated features, (ii) dataset size by increasing the number of training rows, and (iii) label quality by increasing the fraction of mislabeled targets. Beyond predictive performance, we analyze internal signals including attention concentration and attention-based feature ranking metrics. Across these parametric tests, TabPFN is remarkably resilient: ROC-AUC remains high, attention stays structured and sharp, and informative features are highly ranked by attention-based metrics. Qualitative visualizations with attention heatmaps, feature-token embeddings, and SHAP plots further support a consistent pattern across layers in which TabPFN increasingly concentrates on useful features while separating their signals from noise. Together, these findings suggest that TabPFN is a robust TFM capable of maintaining both predictive performance and coherent internal behavior under various scenarios of data imperfections.
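The perturbation protocol the abstract describes is easy to emulate. Below is a hedged sketch of a generator covering the three controlled axes: injected uncorrelated features, nonlinearly correlated features, and label flips. The function names, feature counts, and nonlinear transforms are our illustrative assumptions, not the paper's exact protocol; any classifier exposing a scikit-learn-style fit/predict_proba API (e.g., TabPFNClassifier from the tabpfn package, if installed) can be passed in.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def perturbed_dataset(n=1000, p_signal=5, p_noise=20, flip_frac=0.1, seed=0):
    """Binary task with (i) injected uncorrelated noise columns,
    (ii) nonlinearly correlated copies of signal columns, and
    (iii) a fraction of flipped labels."""
    rng = np.random.default_rng(seed)
    X, y = make_classification(n_samples=n, n_features=p_signal,
                               n_informative=p_signal, n_redundant=0,
                               random_state=seed)
    noise = rng.standard_normal((n, p_noise))   # (i) widen with pure noise
    nonlin = np.sin(X[:, :2]) + X[:, :2] ** 2   # (ii) nonlinear correlates
    flip = rng.random(n) < flip_frac            # (iii) label noise
    y = np.where(flip, 1 - y, y)
    return np.hstack([X, nonlin, noise]), y

def robustness_auc(clf, **kwargs):
    """Fit on the perturbed data and score ROC-AUC on a held-out split."""
    X, y = perturbed_dataset(**kwargs)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf.fit(X_tr, y_tr)  # for TabPFN, this conditions the in-context pass
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

Sweeping flip_frac or p_noise and plotting the returned AUC reproduces the shape of the robustness curves the abstract reports, for whatever estimator is supplied.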
Starting Off on the Wrong Foot: Pitfalls in Data Preparation
Guo, Jiayi, Dong, Panyi, Quan, Zhiyu
When working with real-world insurance data, practitioners often encounter challenges during the data preparation stage that can undermine the statistical validity and reliability of downstream modeling. This study illustrates that conventional data preparation procedures, such as random train-test partitioning, often yield unreliable and unstable results when confronted with highly imbalanced insurance loss data. To mitigate these limitations, we propose a novel data preparation framework leveraging two recent statistical advances: support points for representative data splitting, which ensures distributional consistency across partitions, and the Chatterjee correlation coefficient for initial, nonparametric feature screening, which captures feature relevance and dependence structure. We integrate these advances into a unified, efficient framework that also handles missing data, and embed it within our custom InsurAutoML pipeline. The performance of the proposed approach is evaluated on both simulated datasets and datasets frequently cited in the academic literature. Our findings demonstrate that statistically rigorous data preparation not only significantly improves model robustness and interpretability but also substantially reduces computational resource requirements across diverse insurance loss modeling tasks. This work provides a crucial methodological upgrade for achieving reliable results in high-stakes insurance applications.
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Banking & Finance > Insurance (0.48)
- Health & Medicine (0.46)
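The Chatterjee coefficient used for screening in the abstract above has a simple closed form: for samples without ties, sort by x, take the ranks r_i of y in that order, and compute xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1). The sketch below implements that published formula; the screening threshold and function names are illustrative assumptions, and the support-points splitting step is omitted.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n (assumes no ties in y):
    sort by x, rank y, then xi = 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1)."""
    n = len(x)
    order = np.argsort(x, kind="stable")
    r = np.argsort(np.argsort(y[order])) + 1   # 1-based ranks of y in x-order
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n ** 2 - 1)

def screen_features(X, y, threshold=0.05):
    """Nonparametric screening: keep columns whose xi with the target
    exceeds a cutoff (the cutoff here is illustrative, not the paper's)."""
    xi = np.array([chatterjee_xi(X[:, j], y) for j in range(X.shape[1])])
    return np.where(xi > threshold)[0], xi
```

Unlike Pearson correlation, xi_n detects arbitrary (including non-monotone) dependence, which is what makes it attractive for an initial screen on heavy-tailed, imbalanced loss data.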
Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning
We show that bringing the intermediate-layer representations of two augmented versions of an image closer together in self-supervised learning improves the momentum contrastive (MoCo) method. To this end, in addition to the contrastive loss, we either minimize the mean squared error between the intermediate-layer representations or push their cross-correlation matrix toward the identity matrix. Both loss objectives either outperform standard MoCo or achieve similar performance on three diverse medical imaging datasets: NIH-Chest Xrays, Breast Cancer Histopathology, and Diabetic Retinopathy. The gains of the improved MoCo are especially large in the low-labeled data regime.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > Canada (0.04)
- Europe > Germany > Berlin (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.15)
- North America > United States (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > United States > Maryland (0.04)
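A hedged PyTorch sketch of the two auxiliary objectives the abstract names: a mean squared error between intermediate features of the two augmented views, and a Barlow-Twins-style loss that drives their cross-correlation matrix toward the identity. The off-diagonal weight, the normalization details, and how the term is mixed into the MoCo loss are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def intermediate_mse(h1, h2):
    """MSE between intermediate-layer features of the two augmented views."""
    return F.mse_loss(h1, h2)

def cross_correlation_loss(h1, h2, off_diag_weight=5e-3):
    """Push the cross-correlation matrix of batch-normalized intermediate
    features (shape: batch x dim) toward the identity matrix."""
    b = h1.shape[0]
    z1 = (h1 - h1.mean(0)) / (h1.std(0) + 1e-6)
    z2 = (h2 - h2.mean(0)) / (h2.std(0) + 1e-6)
    c = z1.T @ z2 / b                                 # dim x dim matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()    # diagonal -> 1
    off_diag = c.pow(2).sum() - torch.diagonal(c).pow(2).sum()  # rest -> 0
    return on_diag + off_diag_weight * off_diag

# Combined objective, with alpha an assumed mixing weight:
#   loss = moco_contrastive_loss + alpha * intermediate_mse(h_q, h_k)
# or, alternatively,
#   loss = moco_contrastive_loss + alpha * cross_correlation_loss(h_q, h_k)
```

Either auxiliary term acts only on intermediate activations, so the standard MoCo query/key encoders and momentum update are unchanged.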